A Method for Solid State Genomic Analysis
TECHNICAL FIELD

This application claims benefit from prior Provisional Application Serial No. 60/123,362,
filed March 8, 1999.

This invention relates to the sequencing of DNA. More specifically, this invention is a
method of mapping the relative positions of specific segments of nucleic acid using scanning
probe microscopy.

BACKGROUND

The human genome project is arguably the largest and most important scientific
collaboration in history. Of more importance is the fact that the human genome project is just
the beginning of the genome revolution. It is generally accepted that once the sequence of a
genome is known, it can be "mined" for information that will be invaluable in deriving useful
products such as new drugs, genetic medicines, improved animal and plant produce, and a host of
others. While current methods are adequate for the human genome project to reach its projected
completion date early in this millennium, there is ample room for improvements in technologies
that would facilitate genome mapping efforts.

While a small number of important genomes are under analysis or have been fully
sequenced, current methods are costly and limited in their speed. The genomes of a wide variety
of health related and agriculturally relevant organisms remain to be explored. Using current
methodology to repeat the effort spent on the human genome for every animal and plant that
remains to be studied would be laborious and extremely time consuming. It is therefore essential
that technological improvements in current genome analysis methods be invented and
implemented to aid in this undertaking.



Current Technology

The initial goal of all genome projects is to acquire the highest quality sequence data for
the genome being studied. This is accomplished by determining the nucleotide sequence of
fragments of the genome, and then assembling these sequence fragments into the complete
genome sequence. There are no methods in existence for direct sequencing of an entire genome
greater than a few thousand base pairs in a single experiment.

The current method for sequencing genomes involves first digesting the genomes with a
restriction endonuclease. The genome is then subcloned into a variety of vectors including, but
not limited to, plasmids, phage vectors, bacterial artificial chromosomes (BACs), and yeast
artificial chromosomes (YACs). These fragments are still too large for direct sequencing, and
must be further fragmented. The process of re-assembly of all the sequence information
represented in these fragments is a formidable task. Current methods of genome analysis split
the DNA (deoxyribonucleic acid) into many sub-genomic DNA fragments. These fragments are
assembled into contiguous arrays known as "contigs." There are two general prior art
approaches to forming these contigs.

One prior art method used to form contigs is to identify nucleotide sequences by creating
"restriction maps" of DNA fragments. These DNA fragments can serve to identify genomic
fragments and also to identify the overlaps between fragments. A restriction map is a DNA
profile that demarcates the positions of target sites for sequence specific restriction
endonucleases along the length of the DNA. These maps are generated by digestion of the DNA
with a restriction endonuclease and display of the digestion products by electrophoretic
separation on a gel matrix, usually agarose or polyacrylamide. One advantage to this process is
that it clearly defines which members of a large population or "library" of gene fragments still
need to be sequenced, thereby eliminating undesirable redundancy of effort. Furthermore, once
each fragment has been mapped, the maps themselves can be used to determine the order of the
fragments in the original sample. This process facilitates their sequential assembly into contigs.
This process provides fragment size information, but must be repeated several times with a
number of variations to allow deduction of the restriction fragment order in a large DNA sample.
A need exists for a method that will reduce the effort, time and expense of the above method of
nucleotide sequence mapping.

Other methods for characterizing genomic fragments also exist. For example, one
common method known in the art as PCR footprinting uses defined sets of short oligonucleotide
primers and generates a diagnostic set of PCR fragments from each genomic piece.

The second general prior art approach to genome analysis is to "shotgun" sequence
randomly selected fragments and attempt to assemble them into the continuous genome sequence
by locating sequence overlaps. This requires a large degree of redundancy in the sequencing
effort. It is necessary to sequence many-fold more DNA than is contained in a single genome to
ensure that as many of the genes as possible have been included in the effort. While this
approach works for small genomes, the requirement for redundancy of effort, coupled with the
extremely low probability of obtaining sequence information for every gene in a genomic library,
limits its utility. A need exists for a method that reduces the effort necessary to create these
genomic libraries.

Both of these methods are facilitated by the use of physical markers to help identify the
specific nucleotide sequence and produce a genomic map. The physical markers used can be
produced in a variety of ways and with a wide range in precision. The markers can be genetic
loci deduced from classical genetic approaches (e.g., genetic crosses and relative proximity
analysis) or more direct methods such as fluorescence in situ hybridization analysis (FISH). The
former process is laborious and can be time consuming, especially in the case of slow growing
organisms or organisms for which the genetic manipulation tools are rudimentary at best. The
latter process requires that prior knowledge about the sequence of the genes under scrutiny be
available.

It must be noted that for mapping a genome, it is necessary to have two libraries, each
constructed using a different restriction endonuclease. This way, the fragments in the two
libraries will overlap (since the two different restriction endonucleases cut the genome at
different locations). Thus, by mapping the two libraries, and comparing the results, regions of
overlap are discovered and this determines the physical order of the fragments in the genome.
These fragments can then be sequenced and the entire genomic sequence determined.

Gene Fragment Polymorphisms (GFPs)

In many cases it is of interest to compare DNA sequences from two sources. For
example, in DNA "fingerprinting" applications one can use small variations in the sequence of
DNA to determine the probability that a particular piece of DNA is derived from a given source.
One method to do this is to compare the positions of target sites for restriction endonucleases,
which cut DNA in a site specific fashion. If small changes have occurred in the
defined DNA sequence from two sources, it is likely that the restriction endonuclease site map
will reflect this, either by the gain or the loss of one or more sites. These changes are referred to
as restriction fragment length polymorphisms, or RFLPs. RFLPs are a subset of all types of gene
fragment polymorphisms, or GFPs. RFLP analysis is usually carried out by the conventional
method described above: a restriction endonuclease digestion, followed by gel electrophoresis
and Southern blotting. A need exists for a method of analyzing these GFPs that would reduce
the time and labor involved, as well as the expenditure on reagents required by these steps.
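
The gain or loss of sites between two restriction maps can be illustrated with a minimal sketch. The function name, the site positions, and the matching tolerance below are hypothetical values chosen for illustration; they do not come from the specification.

```python
def compare_site_maps(map_a, map_b, tolerance_bp=50):
    """Compare two restriction site maps (positions in base pairs).

    Returns (lost, gained): sites present in map_a with no counterpart in
    map_b, and sites present in map_b with no counterpart in map_a.
    Two sites are treated as the same site if they lie within tolerance_bp.
    """
    def unmatched(source, other):
        return [s for s in source
                if all(abs(s - o) > tolerance_bp for o in other)]

    return unmatched(map_a, map_b), unmatched(map_b, map_a)


# Hypothetical site maps for the same locus from two sources.
source_a = [1200, 5400, 9800, 15000]
source_b = [1210, 9790, 12500, 15020]

lost, gained = compare_site_maps(source_a, source_b)
print("sites lost in source B:", lost)
print("sites gained in source B:", gained)
```

A nonzero tolerance is used because measured positions from two experiments rarely agree exactly; the appropriate value would depend on the resolution of the mapping method.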

Functional Sequence Mapping

A large portion of genomic DNA does not encode active genes. In addition, a significant
portion of the functional component of a gene is never transcribed into RNA or used to construct
a protein. However, these regulatory regions of genes are critical for expression of the gene
product and play key roles as, for example, targets for new drugs that regulate levels of gene
expression. To discover which regions are functional and which are not, with regard to gene
activity, it is often necessary to do a large number of studies with large populations of sub
fragments of the genome. This practice can take years of redundant, laborious, and expensive
work.

Scanning Probe Microscopy and Atomic Force Microscopy

A scanning probe microscope (SPM) utilizes a probe which is scanned over a surface.
The interaction between the probe and surface is detected, recorded, and displayed. If the probe
is small and kept very close to the surface, the resolution of the SPM can be very high, even on
the atomic scale in some cases. There is a wide variety of SPM instruments capable of detecting
optical, electronic, conductive, and other properties. One form of SPM, the atomic force
microscope (AFM), is an ultra-sensitive force transduction system. In the AFM, a sharp tip is
situated at the end of a flexible cantilever and scanned over a sample surface. While scanning,
the cantilever is deflected by the net sum of the attractive and repulsive forces between the tip
and sample. If the spring constant of the cantilever is known, the net interaction force can be
accurately determined from the deflection of the cantilever. The deflection of the cantilever is
usually measured by the reflection of a focused laser beam from the back of the cantilever onto a
split photodiode, constituting an "optical lever" or "beam deflection" mechanism. Other
methods for the detection of cantilever deflection include interferometry and piezoelectric strain
gauges. The first AFMs recorded only the vertical displacements of the cantilever. More recent
methods involve resonating the tip and allowing only transient contact, or in some cases no
contact at all, between it and the sample. Plots of tip displacement or resonance changes as it
traverses a sample surface are used to generate topographic images. Such images have revealed
the 3D structure of a wide variety of sample types including material, chemical and biological
specimens. Some examples of the latter include DNA, proteins, chromatin, chromosomes, ion
channels, and even living cells.

In addition to its imaging capabilities, the AFM can directly sense and measure forces in
the microNewton (10^-6 N) to picoNewton (10^-12 N) range. Thus, the AFM can measure forces
between molecular pairs, and even within single molecules. Moreover, the AFM can measure a
wide variety of other forces and phenomena, such as magnetic fields, thermal gradients and
viscoelasticity. This ability can be exploited to map force fields on a sample surface, and reveal
with high resolution the location and magnitude of these fields, as in, for example, localizing
magnetic microparticles tethered to biomolecular complexes of interest.
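
The determination of force from cantilever deflection, given a known spring constant, follows Hooke's law (F = k x). A minimal sketch is shown here; the spring constant and deflection are illustrative assumptions and are not values taken from the specification.

```python
def cantilever_force(spring_constant_n_per_m, deflection_m):
    """Net tip-sample force in newtons from Hooke's law, F = k * x."""
    return spring_constant_n_per_m * deflection_m


def to_piconewtons(force_n):
    """Convert a force in newtons to piconewtons (1 pN = 1e-12 N)."""
    return force_n * 1e12


# Assumed example: a soft 0.1 N/m cantilever deflected by 2 nm.
force = cantilever_force(0.1, 2e-9)
print(f"{to_piconewtons(force):.0f} pN")
```

With these assumed values the force falls within the picoNewton to microNewton range described above.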

BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a block diagram of the acts that comprise the method of the present invention.

SUMMARY

One embodiment of the present invention relates to a method for determining the order of
nucleic acid segments from a nucleic acid sample, the method comprising tagging
sequence-specific sites of the nucleic acid sample with a sequence specific tag, scanning the
nucleic acid sample using a scanning probe microscope, and analyzing the scan of the nucleic
acid sample to determine the order of nucleic acid segments.

Another embodiment relates to a method for comparing DNA from two different sources, the
method comprising tagging specific segments of a nucleic acid sample from a first source using
a sequence specific tag, tagging specific segments of the nucleic acid sample from a second
source using a sequence specific tag, scanning the tagged nucleic acid sample from the first
source using a scanning probe microscope, scanning the tagged nucleic acid sample from the
second source using a scanning probe microscope, analyzing the scan from the first source and
the scan from the second source using a computer, and comparing the scan from the first source
to the scan from the second source.

This embodiment analyzes DNA by way of example, but the present invention
contemplates that any type of nucleic acid can be used as a sample in the sequencing method.

An object of this invention is use of SPM technology to identify defined sequence
elements in nucleic acid fragments. The SPM scan further aids in the determination of the order
of these elements on the nucleic acid of interest.

Another object of the present invention is to provide a method for simplifying DNA
fingerprinting analysis.

A further object of the present invention is to provide a method for simplifying functional
mapping of DNA fragments.

Yet another object of the present invention is a method for simplifying the mapping of
DNA fragments such as BACs and YACs.






DETAILED DESCRIPTION

The embodiment of the present invention disclosed herein is a method for analysis of
populations of genomes and genomic fragments using a scanning probe microscope, such as an
atomic force microscope or near field optical microscope. The physical maps created by this
approach constitute genetic "bar codes" that can be used in a wide variety of gene identification
and characterization applications. This method can be used to help the re-assembly of mapped
DNA fragments back into the correct order. The AFM may also be used for rapid and precise
mapping of target DNA samples such as cosmids, bacterial artificial chromosomes (BACs) and
yeast artificial chromosomes (YACs). The invention described here also allows study of RFLPs,
length polymorphisms generated by PCR methods, and other forms of GFPs (e.g., single point
mutations). The AFM is used here by way of example, but this does not exclude use of other
types of SPM instrumentation.

Figure 1 shows the acts that constitute the method of the present invention. In this
embodiment a dipstick is used as the substrate onto which the functionalized DNA is bound for
analysis. This dipstick facilitates the rapid mapping and analysis of sequence specific markers
bound to large DNA molecules. However, the substrate to which the DNA is tethered is not
exclusively in this form. The substrate for the tethering surface could be made of any compatible
material known in the art and shaped in any form that can be scanned by the SPM
instrumentation.

The DNA sample is first cut from the source and linearized (10). This material is then set
aside while the dipstick surface is prepared (12). The dipstick surface is prepared by modifying
it with a chemically reactive functional group so that the DNA can be tethered to the surface for
eventual scanning by the SPM (14). The next step is modifying the DNA sample with the
appropriate functional group (16). This step will facilitate binding the DNA to the dipstick
surface later in the method. Once the DNA is functionally modified, the DNA is then tethered to
the dipstick surface (18). After the DNA is properly tethered, it is tagged with a sequence
specific tag (20). The sequence specific tag is what is read (i.e., measured or detected) by the
SPM. The tagged DNA is then dried (22) and aligned in a linear fashion on the dipstick surface.
Drying ensures more stable imaging conditions and, therefore, optimizes data acquisition,
although a drying step is not absolutely required. The tagged and tethered DNA on the dipstick
is now scanned using the SPM instrument (24). In the last step, the readout from the SPM is
analyzed (26).

Many of the steps of the present invention need not be performed in the order laid
out in the following present embodiment. This embodiment is given by way of example. For
instance, the surface can be prepared and modified after the DNA is functionalized instead of
before.

Cutting/Linearization

The first step of the method of the present invention involves obtaining the DNA sample
to be analyzed (10). This is accomplished by cutting and linearizing the DNA to be analyzed.
The DNA can be prepared by fragmenting the desired genomic DNA and ligating the fragments
in a typical cloning vector known to those skilled in the art. The DNA can be excised from the
source plasmid, cosmid, BAC, YAC or any nucleic acid vector using BamHI (Bacillus
amyloliquefaciens H), EcoRI (E. coli restriction endonuclease number I), or any comparable
restriction endonuclease, or other nuclease. DNA can also be prepared by mechanical methods
such as shearing. The reaction conditions are determined by the choice of endonuclease and are
common knowledge to those skilled in the art. The present embodiment utilizes a DNA sample
from a bacterial virus or phage, termed Lambda. This DNA sample is 48,502 base pairs long and
constitutes the entire genome of the Lambda phage.

Surface Preparation

The next act of the method is the selection of a surface (12). This surface will be the site
where the DNA is deposited. The embodiment of the present invention utilizes a dipstick
substrate to which to bind the DNA sample for scanning. In the present embodiment the dipstick
is made of Teflon. Other plastics or inert polymers can be used in alternative embodiments. A
small pad made of mica is attached to one end of the Teflon dipstick. Other embodiments may
incorporate pads made of polished silicon or some similar material that is sufficiently flat to
allow resolution of DNA fragments by AFM. Other surfaces include episilicon, highly ordered
pyrolytic graphite, sapphire, gypsum, or coating with polystyrene and other defined surface
coating materials. Any of these surfaces can also be coated with gold to facilitate formation of
self assembled monolayers from alkanethiolate solutions. The advantage of this approach is that
it facilitates presentation of a wide variety of surface chemistries for various applications.

Surface Modification

Once the dipstick surface is prepared, then the surface must be modified (14) so that it
will react and tether the DNA for analysis. In the present embodiment the ability of
alkanethiolates to form robust monolayers on gold surfaces is exploited. The gold is then coated
with a chemically reactive alkanethiolate monolayer, the reactive portion in this example being
either a carboxyl or succinimide group. These chemically active surfaces then serve as
attachment points for the modified DNA.

In this embodiment a molecule containing a sulfhydryl group at one end, an 11 carbon
alkane chain, and a succinimide group at the other end is used. This molecule is dissolved in
pure ethanol to a final concentration of 1 mM. The gold coated surface is incubated in this
solution for several hours at room temperature, allowing a stable monolayer to form. DNA that
contains an amino group modification at the terminus can then be immobilized on this surface by
formation of an amide bond between the succinimide group on the alkane and the primary amine
on the DNA.

An advantage of this method is that it can be used to create hydrophilic domains
surrounded by hydrophobic domains to which the DNA will not have a high affinity in solution.
This facilitates "floating" the DNA away from the surface during the tagging procedure to
minimize stereochemical hindrance, then deposition of the DNA on the surface by virtue of
dehumidification, which abrogates the hydrophobic effect. To create these surfaces, the first step
is to produce a uniform monolayer of methyl-terminated (hydrophobic) alkanethiolates on a gold
surface. The gold surface can be modified with a pattern such as stripes or checkerboard arrays
by evaporation of gold through a mask in order to create different areas to which to bind the
DNA. UV light is then passed through a mask to oxidize the sulfur atom on the alkanethiolate.
This treatment weakens the sulfur-gold interaction considerably, but only in those regions
subjected to the UV irradiation. Subsequent addition of a succinimide terminated alkanethiolate
results in replacement of the oxidized thiolates with the succinimide terminated molecules and
creation of a patterned array of chemically active domains to which DNA can be specifically
tethered. Variations of this method are known to those skilled in the art.

Alternative methods for surface modification include preparing a positively charged
surface by modification of the surface using a silane compound containing primary amines.
Another embodiment includes spin coating a mica or polished silicon surface with a preparation
of polystyrene. The polystyrene is prepared by dissolution in toluene. Each one of these
alternative methods for surface modification will have a resulting alternative embodiment for the
DNA modification of the next step. A functional group must be placed on the DNA that will
correspondingly react with the surface to tether the DNA. The present invention contemplates
utilizing any of these alternative methods.

DNA Modification

Linearized DNA is modified at one or both ends with a reactive group that allows the
DNA molecule to be firmly tethered to a surface (16). In this embodiment, the DNA is modified
with a primary amine group to allow covalent bonding to the surface bound succinimide group.
The linearized DNA contains staggered or "sticky" ends by virtue of its release from the cloning
vector by a restriction endonuclease, or, in the case described above (Lambda DNA), the natural
ends of the Lambda genome are staggered (note that fragments cleaved from cosmids at the COS
site will have the same 12 nucleotide sticky end as Lambda phage). Since this sequence is
known, a complementary molecule containing the amino terminal group is synthesized by
standard methods known to those skilled in the art. This material can alternatively be purchased
from a commercial vendor of synthetic DNA oligonucleotides. The amino terminated
oligonucleotide is ligated to the linearized DNA fragment using T4 DNA ligase and standard
conditions known to those skilled in the art. The ligation product is then rapidly purified
chromatographically and tethered to the surface as described below. Other methods known to
those skilled in the art can likewise be used to separate large and small DNA fragments.

DNA Deposition and Tethering DNA to the Surface

Although the details of DNA deposition vary, the present embodiment contemplates
depositing a small droplet of DNA solution on a surface and allowing the DNA to bind to the
surface through an end-specific tether (18).



In the present embodiment, the DNA is dissolved in a solution of 10 mM phosphate
buffer, pH 8.0. Common buffers like Tris are not appropriate because they contain primary
amines. Other primary amines besides the one on the DNA fragment can interfere with this step
in the method. The DNA is then loaded into a piezo driven microjet device (similar to that used
in an ink jet printer, but used for biological deposition) and a microdroplet of the DNA solution
is ejected and deposited on the succinimide surface. For large DNA fragments this process can
be too vigorous and result in shearing of the DNA. In these cases the problem can be solved by
depositing the DNA using a pin tool device. A pin tool is a mechanical device with a sharp point
that picks up a small quantity of the DNA solution and deposits it at a desired location by direct
contact and capillary transfer from the pin tool to the surface.

The DNA is allowed to covalently bind to the surface by maintaining the humidity above
60% RH for an hour at room temperature. Unbound materials are washed away and the bound
material is used for subsequent mapping. This is one embodiment of the deposition procedure
and is not meant to exclude other methods for deposition of end-tagged DNA molecules to
specific locations on a surface.

While it is possible to map sequence-specific tags on DNA that is not tethered or
localized on a surface, there are significant advantages to having the DNA tethered at one end in
a defined location. This provides spatial coordinates to which different DNA molecular species
are assigned. In this way, an array of DNA molecules is analyzed on a single surface without the
introduction of any ambiguity regarding the identification of the DNA fragment under scrutiny.
Each molecular species is located at a well defined spatial address in the array. When the DNA
is tethered to a defined spatial address the AFM can be instructed to automatically access those
addresses sequentially. These steps may significantly reduce search time.
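
Sequential access to defined spatial addresses can be sketched as a simple address generator. The grid dimensions, the pitch, and the serpentine visiting order below are assumptions made for illustration; the specification does not state an array geometry or a visiting order.

```python
def array_addresses(rows, cols, pitch_um, origin_um=(0.0, 0.0)):
    """Yield (row, col, x_um, y_um) for each array address.

    Rows are visited in alternating directions (serpentine order) so that
    consecutive addresses are always adjacent, which keeps stage travel
    between neighbouring spots short.
    """
    x0, y0 = origin_um
    for r in range(rows):
        if r % 2 == 0:
            col_order = range(cols)
        else:
            col_order = range(cols - 1, -1, -1)
        for c in col_order:
            yield r, c, x0 + c * pitch_um, y0 + r * pitch_um


# Hypothetical 2 x 3 array with 50 micron spacing between tether sites.
visits = list(array_addresses(2, 3, pitch_um=50.0))
for row, col, x, y in visits:
    print(f"address ({row}, {col}) at x={x} um, y={y} um")
```

Because each address corresponds to one known DNA species, the (row, col) pair can serve as the sample identifier for the scan taken at that location.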



Sequence Specific Tagging of Immobilized DNA

In the next step the DNA is tagged with a sequence specific tag (20). In the present
embodiment the DNA is incubated with a mutated restriction endonuclease in a typical
restriction endonuclease reaction solution (25 mM buffer, pH 7.0; [...] mM monovalent cation,
typically Na+; 10 mM divalent cation, typically Mg2+; 0.5 mM reducing agent, typically
dithiothreitol). The mutant restriction endonuclease has been modified by amino acid
substitution within its catalytic pocket such that the endonuclease can bind its DNA target site,
but is incapable of cutting the DNA. This substitution promotes DNA binding but inhibits
cleavage (see D. Allison, P. Kerper, M. Doktycz, J. Spain, P. Modrich, F. Larimer, T. Thundat,
and R. Warmack, Proc. Natl. Acad. Sci. USA, 1996. 93: p. 8826-8829). This is accomplished by
genetic engineering methods known to practitioners skilled in the art.

Wild type restriction enzymes can also be used by substituting Ca2+ or Mn2+ for Mg2+, the
common catalytic divalent cation. In the presence of Ca2+ or Mn2+, but the absence of Mg2+,
many restriction endonucleases will bind but not cleave DNA. This is because Mg2+ is required
for catalytic activity of the endonuclease. The conditions for binding but not cutting of the DNA
for each restriction endonuclease used are optimized using electrophoretic mobility shift
experiments, a method whose application for this purpose is common knowledge to those skilled
in the art.

The restriction endonuclease tag binds to the DNA, and the surface is quickly rinsed to
remove spuriously bound tag molecules and other debris, such as excess salt. In some cases the
tag can be fixed in position using UV light or a crosslinker such as glutaraldehyde. Other
endonucleases can be used that have been modified such that they bind tightly to DNA but do not
cut the DNA molecule. Those endonucleases that bind tightly to a defined nucleotide sequence,
but do not cut the DNA target, are suitable for gene mapping experiments. Other types of
materials that can be used for tagging specific sequences of the nucleic acid include transcription
factors, nucleotides, modified nucleotides, peptides, functional protein markers, or a duplex,
triplex or quadruplex forming nucleic acid molecule, or a small molecule conjugated to a
microparticle or a nanoparticle.

The description of the present embodiment tag does not exclude the use of other tags that
might be incorporated into the method of the present invention. The DNA is generally tethered
prior to tagging with site specific markers. However, the sequence of events can be altered such
that the DNA is first tagged, then tethered by the methods described herein for analysis by AFM.
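
When the sequence of a sample is already known, the positions at which a sequence specific tag is expected to bind can be predicted by searching for the recognition site. The sketch below assumes the EcoRI recognition sequence GAATTC; the function name and the toy sequence are hypothetical and are not part of the specification.

```python
def find_sites(sequence, recognition_site="GAATTC"):
    """Return 0-based start positions of every occurrence of the site.

    GAATTC is palindromic (its reverse complement is also GAATTC), so
    searching one strand is sufficient for this particular site.
    """
    sequence = sequence.upper()
    site = recognition_site.upper()
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        # Resume one base later so overlapping occurrences are not missed.
        start = sequence.find(site, start + 1)
    return positions


# Toy sequence for illustration only.
toy_sequence = "AAGAATTCGGCCTTGAATTCAT"
expected_tag_positions = find_sites(toy_sequence)
print(expected_tag_positions)
```

For a non-palindromic recognition site, the reverse complement would also need to be searched.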

Drying and Depositing in a Linear Fashion

Next, the tethered and tagged DNA sample must be dried and laid out on the dipstick in a
linear manner (22). In the present embodiment a stream of low moisture inert gas is used. A
stream of gas, such as argon, is bled from a source, such as a canister, over the dipstick. This dry
inert gas carries away any leftover moisture from the tagging step. Furthermore, as this dry inert
gas is bled over the dipstick the inert gas leaves the nucleic acid sample oriented in a uniform
direction. Having the nucleic acid samples dried flat and in a uniform direction aids in the
scanning step of the process.

Alternatively, drying and linear deposition of the samples (22) can be done by several
other techniques known in the art. For example, a number of methods include unidirectional fluid
flow (for linear display) and electromotive force (for linear display). In addition, previous
reports have shown that DNA can be aligned on a surface by slow retraction of a meniscus as the
DNA is dried. (For an explanation, see Bensimon, A., A. Simon, A. Chiffaudel, V. Croquette, F.
Heslot, and D. Bensimon, Science, 1994. 265: p. 2096-2098.)




Scanning

Scanning of the tagged nucleic acid sample in the present embodiment is done utilizing
an atomic force microscope (24). The sample is placed in the instrument. A Digital
Instruments, Inc., Dimension 3100 is used in this embodiment and is controlled by a computer
and software generally available. The computer controls the operation of the tip across the
dipstick. In the present embodiment the nucleic acid samples have been tethered to specific sites
on the stick. The computer can automatically scan these sections and report where the tags are
found, and measure both the contour locations of the tags as well as the distance between the
contours. Knowing where the DNA is specifically bound to the surface aids the analysis of the
scan. Since the conventional AFM is limited in scan field size to about 100 square microns,
knowing where the DNA array is located and the positions of the DNA molecules within the
array greatly speeds the scanning process. Therefore, each array of DNA molecules is initially
located through the use of a physical mark such as an indentation or ink spot, then the array is
scanned and the positions of the molecules within the array noted.

The analysis conditions of this embodiment require low humidity because it minimizes
potentially destructive tip-sample capillary forces and provides a more stable DNA specimen.
The instrument takes data scans of the different sequence specific tags it has located and then
feeds this data to the user who can analyze the output. In other embodiments low humidity
might not be desirable, for example when using the above method and scanning the sample
while in solution.

Because in the present embodiment the DNA is displayed in an ordered array on solid
state surfaces, the array can be processed continuously in the SPM. Through the use of indexing
markers on the surface of the dipstick, the instrumentation can know precisely the position of the
current scan, and therefore the sample that is currently being processed. A bar code is assigned
to the known sample, making that DNA fragment uniquely identifiable thereafter by virtue of its
bar code.

Analyzing the Scan

Software known by those skilled in the art processes the scan produced by the AFM (26). This
software utilizes pattern recognition algorithms that direct the instrumentation to produce a
hardcopy output.

In the present embodiment, the AFM data is collected using commercial software
supplied with the instrument. This data is then ported to a separate computer using a software
program called IDEAS (NanoStar, Baltimore, MD). IDEAS searches the field and finds
continuous data profiles that correspond to intact, linear DNA molecules. IDEAS then measures
the contour length of the molecules and locates the physical markers comprised of bound EcoRI
molecules. The software then plots these locations as a function of fragment length and
generates a histogram showing the probability of finding a physical marker at a given position
along the length of the DNA molecule. These data are averaged and used to generate a
diagnostic bar code for that particular DNA molecule. By measuring the distance between the
tags, the length of the DNA fragment between the tags can be ascertained. The order in which
the segments appear on the contig can be learned utilizing data from several different tagged
nucleic acid samples.
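
The histogram and bar code steps described above can be sketched as follows. This is only an
illustrative sketch and not the IDEAS program itself: the function names, the number of bins,
and the 0.5 threshold are assumptions, and the molecules are assumed to have already been
oriented in the same direction before their tag positions are pooled.

```python
from collections import Counter

def tag_histogram(molecules, n_bins=20):
    """Estimate the probability of finding a tag in each length bin.

    molecules: list of (contour_length, [tag_positions]) pairs, where the
    positions are measured from the same end of every molecule (assumed).
    Returns one value per bin: the fraction of molecules with a tag there.
    """
    counts = Counter()
    for length, tags in molecules:
        # Normalize each position by contour length, then map it to a bin.
        bins_hit = {min(int(pos / length * n_bins), n_bins - 1) for pos in tags}
        counts.update(bins_hit)
    total = len(molecules)
    return [counts[b] / total for b in range(n_bins)]

def bar_code(histogram, threshold=0.5):
    """Keep only the bins where a tag is seen in most molecules (assumed cutoff)."""
    return [i for i, p in enumerate(histogram) if p >= threshold]

# Hypothetical measurements: (contour length, tag positions), arbitrary units.
scans = [(1000, [250, 700]), (1020, [260, 715]), (980, [240, 690]), (1000, [500])]
code = bar_code(tag_histogram(scans, n_bins=10))
```

Averaging over many molecules in this way suppresses a spurious tag that appears on only one
molecule, such as the lone tag in the fourth hypothetical scan.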

The bar codes represent the nucleic acid fragment. Each bar of the code is where the
AFM has found one of the sequence specific tags. The distance between the bars is the distance
between the located tags. By aligning the bars from separate bar codes, the order of each known
fragment can be determined.



The scan can be analyzed by several other methods known in the art. The most basic
method of analyzing the scan is to measure by hand the contours that have been given as output
from the SPM instrument.

The alternative embodiments could utilize a computer program which could analyze the
information on the location of the sequence specific tags and the distance between them. This
computer then would use an algorithm to place the fragments in the proper order as they appear
on the original nucleic acid sample.
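
One possible form of such an ordering algorithm is sketched below. It is a hypothetical
illustration, not an algorithm given in this description: it assumes every bar code is a list of
tag positions read in the same orientation, treats two fragments as overlapping when the last
inter-tag distances of one match the first inter-tag distances of the other within a tolerance,
and chains the fragments greedily. The names `gaps`, `overlap`, `order_fragments`, `tol`, and
`min_gaps` are illustrative.

```python
def gaps(bars):
    """Distances between consecutive bars of one bar code."""
    return [b - a for a, b in zip(bars, bars[1:])]

def overlap(a, b, tol=10.0, min_gaps=2):
    """Largest number of gaps at the tail of a that match the head of b."""
    ga, gb = gaps(a), gaps(b)
    for k in range(min(len(ga), len(gb)), min_gaps - 1, -1):
        if all(abs(x - y) <= tol for x, y in zip(ga[-k:], gb[:k])):
            return k
    return 0

def order_fragments(barcodes, tol=10.0, min_gaps=2):
    """Greedily chain fragments by their best tail-to-head bar code overlap.

    barcodes: dict mapping fragment name -> sorted list of tag positions.
    """
    names = list(barcodes)
    if not names:
        return []
    successor = {}
    for a in names:
        others = [n for n in names if n != a]
        if not others:
            continue
        best = max(others, key=lambda n: overlap(barcodes[a], barcodes[n], tol, min_gaps))
        if overlap(barcodes[a], barcodes[best], tol, min_gaps) > 0:
            successor[a] = best
    # Start from a fragment that is nobody's successor (fall back to the first).
    start = next((n for n in names if n not in successor.values()), names[0])
    order = [start]
    while order[-1] in successor and successor[order[-1]] not in order:
        order.append(successor[order[-1]])
    return order

# Hypothetical bar codes for three overlapping fragments, given out of order.
fragments = {
    "C": [0, 183, 300, 352, 499],
    "A": [0, 102, 249, 301],
    "B": [0, 148, 203, 379, 502],
}
ordered = order_fragments(fragments)
```

Requiring at least two matching gaps reduces the chance that a single coincidental distance is
taken for a true overlap. A practical program would also have to try each bar code in reverse,
since the orientation of a scanned fragment is not known in advance.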

An advantage to the present invention is the bar code system of analyzing the scans.
Utilizing a bar code enables faster ordering of the fragments. Before, researchers had to map the
sequence of the fragment in order to determine the order of the fragments in the sample. The
present invention allows the user to utilize the bar code of each fragment to align the overlapping
fragments in the order they appear on the cosmid without having to determine the sequence of
large sections of the sample.

Further Embodiments and Advantages

An alternative embodiment could be to apply this method to comparing DNA sequences
from two sources. To do this, a sequence specific tag would be utilized that would bind onto an
area where it is thought a nucleotide sequence variation might occur. Once the sequence specific
tags were placed on the DNA fragments utilizing the method described above, the fragments
would be scanned in the SPM. If a tag appears on a section of the DNA in both the samples then
it would increase the likelihood that the samples came from the same source. By choosing the
sequence specific tag that binds on to an unusual nucleotide sequence, accurate fingerprinting
can be done without the long process of restriction fragment length polymorphisms utilizing gel
electrophoresis and Southern blotting.
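
The comparison of tag positions between two samples can be sketched as follows. The scoring
rule here (shared tags divided by the total number of distinct tags) is an assumption chosen for
illustration and is not specified in this description; the tolerance `tol` stands in for the
positional uncertainty of the scan, and both samples are assumed to be measured from the same
end.

```python
def matched_tags(a, b, tol=10.0):
    """Count one-to-one matches between two sets of tag positions."""
    a, b = sorted(a), sorted(b)
    i = j = matches = 0
    while i < len(a) and j < len(b):
        if abs(a[i] - b[j]) <= tol:
            matches += 1      # tag present at the same place in both samples
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1            # tag in sample a has no partner in sample b
        else:
            j += 1            # tag in sample b has no partner in sample a
    return matches

def tag_similarity(a, b, tol=10.0):
    """Shared tags divided by all distinct tags (an assumed, Jaccard-like score)."""
    if not a and not b:
        return 1.0
    m = matched_tags(a, b, tol)
    return m / (len(a) + len(b) - m)

# Hypothetical tag positions from two samples, arbitrary units.
sample_1 = [120, 480, 910]
sample_2 = [118, 485, 700, 905]
score = tag_similarity(sample_1, sample_2)
```

A score near 1 would support a common source, while unmatched tags, such as the one at 700 in
the second hypothetical sample, point to a sequence variation at that location.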

A further embodiment of the present invention involves functional sequence mapping.
As discussed above, a large portion of genomic DNA does not encode active genes. Using the
above method, this invention allows rapid analysis of genomes using markers that specifically tag
regions involved in gene activity. Once the DNA is tagged with these sequence specific markers,
the DNA fragment can be analyzed by the SPM. The location of the sequence specific tags will
report where the active encoding regions are located on the original DNA fragment.

A further embodiment of the present invention involves identification of single nucleotide
polymorphisms. It has been suggested that individual humans have a single nucleotide variation
relative to any other individual every 900 basepairs. These variations can be valuable markers
associated with genetic traits such as predisposition to disease. The maps generated by the
method described here can reveal single nucleotide polymorphisms because such a change in
sequence can preclude or allow binding of the tag to a particular sequence element.
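
As a small illustration of how a single base change removes a bar from the map, the sketch
below assumes a tag that binds the EcoRI recognition sequence GAATTC and uses two made-up
sequences that differ at one position. The sequences and the function name `tag_sites` are
hypothetical.

```python
ECORI_SITE = "GAATTC"  # EcoRI recognition sequence

def tag_sites(sequence, site=ECORI_SITE):
    """Return the 0-based start positions where the tag could bind."""
    seq = sequence.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

# Made-up reference sequence with two binding sites.
reference = "ATGCGAATTCTTAGGCCGAATTCAAT"
# Same sequence with a single A -> G change inside the second site.
variant = reference[:19] + "G" + reference[20:]

reference_map = tag_sites(reference)
variant_map = tag_sites(variant)
```

The variant map lacks the second site, so the two individuals would yield bar codes that differ
by one bar even though their sequences differ by only one base.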

The information and examples described herein are for illustrative purposes and are not
meant to exclude any derivations or alternative methods that are within the conceptual context of
the invention. It is contemplated that various deviations can be made to this embodiment without
deviating from the scope of the present invention. Accordingly, it is intended that the scope of
the present invention be dictated by the appended claims rather than by the foregoing description
of this embodiment.


